Abstract

We provide a new general theorem for multivariate normal ap- 
proximation on convex sets. The theorem is formulated in terms of 
a multivariate extension of Stein couplings. We apply the results to 
a homogeneity test in dense random graphs and to prove multivariate 
asymptotic normality for certain doubly indexed permutation statis- 
tics. 



1 INTRODUCTION 

Let $W$ and $Z$ be $d$-dimensional random vectors, $d \geq 1$, where $Z$ has the
standard $d$-dimensional Gaussian distribution. We are concerned with bounding
the quantity

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) = \sup_{A \in \mathcal{A}} |P(W \in A) - P(Z \in A)|, \tag{1.1}$$

where $\mathcal{A}$ denotes the collection of all the convex sets in $\mathbb{R}^d$.

Our main tool is Stein's method for the multivariate normal distribution,
which has already been used to obtain bounds on (1.1), the two main
contributions coming from Götze (1991) for sums of independent random
vectors (see also Bhattacharya and Holmes (2010)), and Rinott and Rotar
(1996) for sums of dependent random vectors that allow for a certain
decomposition. Most other contributions on multivariate normal approximation
via Stein's method have focused on smooth functions; see e.g. Goldstein and
Rinott (1996), Raič (2004) and Reinert and Röllin (2009).



The main aim of this article is to improve the results of Rinott and Rotar
(1996) in two important ways. Firstly, we remove a logarithmic factor in the
error bound of Rinott and Rotar (1996). The techniques that allow us to do
this are taken from Fang (2012) and will yield optimal rates of convergence
in some applications. Secondly, the assumptions made on the dependence
by Rinott and Rotar (1996) do not cover the applications we will discuss
here. Instead, we will use a multivariate generalisation of Stein couplings to
achieve the necessary generality. Stein couplings, introduced by Chen and
Röllin (2010), capture the minimal structural assumption necessary to use
Stein's method for normal approximation.

We will also keep the dependence of the constants on the dimensionality 
explicit and as small as possible without blowing up the proofs, but we do 
not pursue optimality in that respect. 

The remainder of this article is organised as follows. In Section 2 we will
state our main abstract theorem, but we will postpone the (rather technical)
proof to Section 4. In Section 3 we will discuss two main applications, one
involving permutation statistics and the other a new test for heterogeneity
in dense graphs. In Section 5 we will present some standard multivariate
Stein couplings for reference.

2 MAIN RESULTS 
Stein couplings were introduced by Chen and Röllin (2010) in order to unify
many of the approaches developed around Stein's method for normal
approximation, such as the local approach, size biasing and exchangeable pairs,
to name but a few. In the spirit of Chen and Röllin (2010), we give a
multivariate definition of Stein couplings.

Definition 2.1. A triple of square integrable $d$-dimensional random vectors
$(W, W', G)$ is called a $d$-dimensional Stein coupling if

$$\mathbb{E}\{G^t F(W') - G^t F(W)\} = \mathbb{E}\{W^t F(W)\} \tag{2.1}$$

for all $F : \mathbb{R}^d \to \mathbb{R}^d$ for which the expectations exist.

Remark 2.2. By choosing $F(w) = e_i$, where $e_i$ is the $i$th unit vector,
it follows from (2.1) that $\mathbb{E}W_i = 0$. Therefore, $\mathbb{E}W = 0$ is a necessary
condition for a Stein coupling. Choosing $F(w) = w_j e_i$, it follows that

$$\mathbb{E}\{G(W' - W)^t\} = \operatorname{Cov}(W).$$

Throughout this article, $|x|$ denotes the Euclidean norm of $x \in \mathbb{R}^d$, and
$I_d$ denotes the $d$-dimensional identity matrix. With this, we can formulate
our main result.

Theorem 2.1. Let $(W, W', G)$ be a $d$-dimensional Stein coupling. Assume
that $\operatorname{Cov}(W) = I_d$. With $D = W' - W$, suppose that there are positive
constants $\alpha$ and $\beta$ such that

$$|G| \leq \alpha, \qquad |D| \leq \beta. \tag{2.2}$$

Then there is a universal constant $C$ such that

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) \leq C\bigl(d^{7/4}\alpha\,\mathbb{E}|D|^2 + d^{1/4}\beta + d^{7/8}\alpha^{1/2}B_1^{1/2} + d^{3/8}B_2 + d^{1/8}B_3^{1/2}\bigr), \tag{2.3}$$

where $Z$ is a $d$-dimensional standard Gaussian random vector and

$$B_1 = \sqrt{\operatorname{Var}\mathbb{E}^W|D|^2}, \qquad B_2 = \sum_{i,j=1}^d \sqrt{\operatorname{Var}\mathbb{E}^W(G_iD_j)}, \qquad B_3 = \sum_{i,j,k=1}^d \sqrt{\operatorname{Var}\mathbb{E}^W(G_iD_jD_k)}.$$

As usual, we can upper bound $\operatorname{Var}\mathbb{E}^W(\cdot)$ by $\operatorname{Var}\mathbb{E}^{\mathcal{F}}(\cdot)$ for any $\sigma$-algebra
$\mathcal{F} \supset \sigma(W)$. This is a standard trick in Stein's method and will be used in
the applications without further mention.

Note that, if $(W, W', G)$ is a $d$-dimensional Stein coupling and $A$ is an $m \times d$
matrix, $m \geq 1$, then $(AW, AW', AG)$ is an $m$-dimensional Stein coupling.
In this light, assuming that $\operatorname{Cov}(W) = I_d$ is a matter of convenience rather
than a real restriction. Note, however, that the dependence on the dimension
will be affected. If $A$ is a $d \times d$ matrix, denote by $\|A\|_2$ its operator norm
with respect to the Euclidean norm and denote by $\|A\|_\infty$ its element-wise
supremum norm. Noticing that $d_c$ is invariant under linear transformations,
we have the following immediate consequence of Theorem 2.1.



Corollary 2.2. Under the conditions of Theorem 2.1, but now allowing
$\operatorname{Cov}(W) = \Sigma$ for any positive definite $\Sigma$, there is a constant $C_d$ depending
only on the dimension such that

d c (^(^ l/2 W),^(Z)) = d c {^{W),^{T}/ 2 Z)) 

^ C d ( as mD\ 2 + s 2 p + Soo a^ 2 Bl /2 + + s% 2 Bl /2 ) 



where $s_2 = \|\Sigma^{-1/2}\|_2$, $s_\infty = \|\Sigma^{-1/2}\|_\infty$ and



d 



B 4 = J2 yVarE^Z) 2 . 

i=l 

Note that the corollary cannot be expected to be informative if $\Sigma$ is
singular or close to singular. In particular, the $W_i$ need to be standardized
so that $\operatorname{Var} W_i$, $1 \leq i \leq d$, are all of the same order.

Remark 2.3. If $(W, W')$ is an exchangeable pair of $d$-dimensional vectors
and

$$\mathbb{E}^W(W' - W) = -\Lambda W$$

for some invertible $d \times d$ matrix $\Lambda$, then $(W, W', \tfrac{1}{2}\Lambda^{-1}(W' - W))$ is a Stein
coupling. Reinert and Röllin (2009) showed that, under the very special
condition that

$$\mathbb{E}^W(W' - W) = -\lambda W \tag{2.5}$$

for some $0 < \lambda < 1$, one only needs that $\mathcal{L}(W) = \mathcal{L}(W')$ without
exchangeability for their results to hold. The situation for our Theorem 2.1 is the
same. If $W$ and $W'$ have the same marginal distribution and satisfy (2.5),
then (2.3) holds if $\operatorname{Cov}(W) = I_d$ (respectively (2.4) holds if $\operatorname{Cov}(W) = \Sigma$)
with $G = \tfrac{1}{2}\lambda^{-1}(W' - W)$. A sketch of the proof will be given in Section 4.



3 APPLICATIONS 

3.1 A confidence interval for dense homogeneous random graphs 

A basic problem in the analysis of graphs is to test whether a given graph of
size $n$ is compatible with the assumption that it is a realisation of an
Erdős-Rényi random graph $G(n,p)$ with edge probability $p$. Many test statistics
have been proposed, such as diameter, maximal degree, number of triangles,
etc.; see e.g. Pao et al. (2011). Here we propose and justify a new test that is
based on the theory of dense graph limits. We will only introduce the parts
of the theory necessary for our application; see Borgs et al. (2008, 2012) for
a thorough introduction; see also Diaconis and Janson (2008), who make a
connection with earlier work of Aldous (1981).

Denote by $\mathrm{end}(F, G)$ the set of injective graph homomorphisms from $F$
into $G$. If $F$ has $k$ vertices, then it is clear that
$|\mathrm{end}(F, G)| \leq (n)_k := n(n-1)\cdots(n-k+1)$. Define

$$t(F, G) = \frac{|\mathrm{end}(F, G)|}{(n)_k},$$

the "density of $F$ in $G$". Let $(G_n)$ be a sequence of graphs (where $n \geq n_0$
for some unspecified $n_0$) and for convenience assume that $G_n$ has $n$
vertices. The sequence is called a dense graph sequence if the number of
edges is of order $n^2$. It is clear that if the sequence $(G_n)$ is not dense then
$\lim_{n \to \infty} t(F, G_n) = 0$ for any $F$ containing at least one edge.
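As a small illustration of the density just defined, the following brute-force sketch counts injective homomorphisms by trying every injective vertex map. The function names and the edge-list representation are assumptions made for this example only; the approach is only feasible for very small graphs.

```python
from itertools import permutations
from math import perm

def count_injective_homs(pattern_edges, k, graph_edges, n):
    """Count injective maps {0..k-1} -> {0..n-1} that send every pattern
    edge to an edge of the graph (brute force, small n only)."""
    adjacency = {frozenset(e) for e in graph_edges}
    count = 0
    for image in permutations(range(n), k):  # all injective vertex maps
        if all(frozenset((image[u], image[v])) in adjacency
               for u, v in pattern_edges):
            count += 1
    return count

def density(pattern_edges, k, graph_edges, n):
    """Injective homomorphism count divided by n(n-1)...(n-k+1)."""
    return count_injective_homs(pattern_edges, k, graph_edges, n) / perm(n, k)

# Pattern graphs: a single edge and a 4-cycle.
K2 = [(0, 1)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Host graph: the complete graph on 4 vertices.
complete_4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]

# Dividing by the automorphism group sizes 2 and 8 gives edge and 4-cycle counts.
edges_in_host = count_injective_homs(K2, 2, complete_4, 4) // 2   # 6 edges
cycles_in_host = count_injective_homs(C4, 4, complete_4, 4) // 8  # 3 four-cycles
```

In the complete graph every injective map is a homomorphism, so both densities equal 1 there.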

Let now $\kappa : [0,1]^2 \to [0,1]$ be a measurable, symmetric function; we
shall call any such function a standard kernel. For any finite graph $F$ with
$k$ vertices, let us define the "density of $F$ in $\kappa$" as

$$t(F, \kappa) = \int_{[0,1]^k} \prod_{\{i,j\} \in E(F)} \kappa(x_i, x_j)\, dx_1 \cdots dx_k,$$

where $E(F)$ denotes the edge set of graph $F$.

The cornerstone of dense graph theory is that if $t(F, G_n)$ converges for
every $F$, then there is a standard kernel $\kappa$ such that $\lim t(F, G_n) = t(F, \kappa)$.
We can therefore say that $\kappa$ is the limit of the graph sequence $(G_n)$. Note,
however, that $\kappa$ is usually not unique; we refer again to Borgs et al. (2008,
2012) on how to characterise the equivalence classes of standard kernels
representing the same graph limits.
Let now $K_n$ be the complete graph of size $n$, and let $C_n$ be the cycle
graph of size $n$. Chung et al. (1989) proved the surprising result that if

$$t(K_2, G_n) \to p \quad\text{and}\quad t(C_4, G_n) \to p^4$$

for some $0 < p \leq 1$, then

$$t(F, G_n) \to p^{e(F)}$$

for any graph $F$, where $e(F)$ is the number of edges in $F$. That is, in the
dense case, the limiting densities of edges and 4-cycles determine whether
the limit is an (infinite) homogeneous Erdős-Rényi random graph. In other
words, $\kappa \equiv p$ is the only standard kernel with $t(K_2, \kappa) = p$ and $t(C_4, \kappa) = p^4$;
see Lovász and Szegedy (2011) for generalisations of these findings. This
suggests that we may use the number of edges and 4-cycles in a finite graph
as test statistics.
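To make the role of the 4-cycle density concrete, here is a minimal sketch that evaluates the kernel density $t(F, \kappa)$ exactly for step-function kernels (kernels that are constant on products of blocks of $[0,1]$). The step-kernel restriction, the function name and the example values are assumptions chosen for illustration. The example shows a non-constant kernel with edge density 0.3 whose 4-cycle density exceeds $0.3^4$.

```python
from itertools import product

def step_kernel_density(pattern_edges, k, weights, values):
    """t(F, kappa) for a step kernel: kappa equals values[a][b] on
    block a x block b, and block a has length weights[a]."""
    blocks = range(len(weights))
    total = 0.0
    for assignment in product(blocks, repeat=k):  # block of each vertex
        term = 1.0
        for a in assignment:
            term *= weights[a]                    # volume of the block cell
        for u, v in pattern_edges:
            term *= values[assignment[u]][assignment[v]]
        total += term
    return total

K2 = [(0, 1)]
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Constant kernel kappa = 0.3 versus a two-block kernel with the same edge density.
constant = ([1.0], [[0.3]])
two_block = ([0.5, 0.5], [[0.5, 0.1], [0.1, 0.5]])

edge_density = step_kernel_density(K2, 2, *two_block)         # 0.3, as for the constant kernel
cycle_density = step_kernel_density(C4, 4, *two_block)        # 0.0097
cycle_density_constant = step_kernel_density(C4, 4, *constant)  # 0.3**4 = 0.0081
```

The edge density alone cannot separate the two kernels, while the 4-cycle density can.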

However, some care is needed. It is known that for a homogeneous
Erdős-Rényi random graph, the number of edges and the number of 4-cycles are
linearly dependent in the limit; see Janson and Nowicki (1991). Thus, we
cannot use these values directly to construct our test.

Following Janson and Nowicki (1991), we can instead consider the
density of 4-cycles corrected by the edge density (this is essentially the first
non-leading term in a Hoeffding-type decomposition for the 4-cycle count).
To this end, if $G_n$ is a given graph of size $n$, define the two statistics

$$T_1(G_n) = \frac{|\mathrm{end}(K_2, G_n)|}{2}, \qquad T_2(G_n) = \frac{|\mathrm{end}(C_4, G_n)|}{8}.$$

The factors 2 and 8, respectively, are the sizes of the automorphism groups
of $K_2$ and $C_4$, respectively. Therefore, $T_1$ is just the number of edges in $G_n$
and $T_2$ is the number of 4-cycles in $G_n$. Define now the normalised edge
counts



$$W_1(p, G_n) = \frac{T_1(G_n) - \binom{n}{2}p}{\sigma_1} \quad\text{with}\quad \sigma_1^2 = \binom{n}{2}p(1-p),$$



and the corrected and normalised 4-cycle count 



W 2 (p,G n ) = 



cr 2 



with 



(Tr, 



3, » ] ( - 4p» + 3^) + (4(„ - 4) + 2)^(1 - 2p + ^) 



n (4) (n - 3) 



p 6 +p 8 



2p 7 



+ n 



(4)" 



+p 8 



2p b 



Note that Barbour et al. (1989) use Stein's method to prove univariate
normal approximations of subgraph counts and related statistics. However,
for statistics such as $W_2$, they resort to the method of moments. The
reason that $W_2$ is more difficult to handle is that it is a degenerate incomplete
$U$-statistic of the edges (see (3.2) below), whereas simple subgraph counts
are non-degenerate. Corresponding multivariate results were obtained by
Janson and Nowicki (1991) in great generality using Hoeffding-type
decompositions and the method of moments. For the degenerate statistics they
state that "Stein's method does not seem to work in that case". Our next
result shows that it is nevertheless possible.

Theorem 3.1. Let $G_n$ be a realisation of an Erdős-Rényi random graph on
$n$ vertices and edge probability $p$. Let $W = (W_1(p, G_n), W_2(p, G_n))$ and let
$Z$ be a standard bivariate normal random variable. There is a universal
constant $C$ independent of $p$ and $n$ such that

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) \leq \frac{C}{p^9(1-p)^3\sqrt{n}}.$$

Theorem 3.1 justifies the following procedure to construct a confidence
set for the family of Erdős-Rényi random graphs. Let $G_n$ be a simple,
unlabeled graph of size $n$. For some $0 < \alpha < 1$, define the $1 - \alpha$ confidence
set as

$$C_{1-\alpha}(G_n) = \{0 < p < 1 : W_1^2(p, G_n) + W_2^2(p, G_n) \leq q_{1-\alpha}\},$$

where $q_{1-\alpha}$ is the $1 - \alpha$ quantile of the $\chi^2$-distribution with 2 degrees of
freedom. $C_{1-\alpha}(G_n)$ is the set of those $p$ for which $G_n$ is compatible with the
model $G(n,p)$. If $C_{1-\alpha}(G_n)$ is empty, then $G_n$ is not compatible with any
homogeneous Erdős-Rényi random graph model (at significance level $\alpha$).
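The membership test for the confidence set can be sketched as follows. The quantile uses the fact that a $\chi^2$-variable with 2 degrees of freedom has survival function $e^{-x/2}$, so its $1-\alpha$ quantile is $-2\log\alpha$. The function names are hypothetical, and the values `w1`, `w2` stand for $W_1(p, G_n)$ and $W_2(p, G_n)$, which are assumed to have been computed elsewhere.

```python
from math import log

def chi2_two_df_quantile(alpha):
    """(1 - alpha)-quantile of the chi-square distribution with 2 degrees
    of freedom; its survival function is exp(-x / 2)."""
    return -2.0 * log(alpha)

def in_confidence_set(w1, w2, alpha):
    """True if a candidate p with statistics (w1, w2) is kept at level alpha.
    w1 and w2 are placeholders for W_1(p, G_n) and W_2(p, G_n)."""
    return w1 * w1 + w2 * w2 <= chi2_two_df_quantile(alpha)

threshold = chi2_two_df_quantile(0.05)  # roughly 5.99
```

Scanning a grid of candidate values of $p$ with `in_confidence_set` would then give a discretised approximation of the confidence set.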

Corollary 3.2. For any given $0 < p_l < p_u < 1$,

$$P[p \notin C_{1-\alpha}(G(n,p))] = \alpha + O(n^{-1/2})$$

uniformly in $p \in [p_l, p_u]$ as $n \to \infty$. Furthermore, for any non-constant
standard kernel $\kappa$,

$$P[C_{1-\alpha}(G(n,\kappa)) = \emptyset] \to 1 \tag{3.1}$$

as $n \to \infty$.

Note that the second part of the corollary follows from standard
concentration results; see e.g. Borgs et al. (200



Remark 3.1. Note that (3.1) essentially says that the test will eventually
detect any non-homogeneity as $n \to \infty$. This is no longer true if 4-cycles
were to be replaced by triangles; see e.g. Chung et al. (1989, p. 361) for
an example where the edge and triangle densities of a heterogeneous
Erdős-Rényi random graph are indistinguishable from those of a homogeneous one.
Remark 3.2. If one is interested in testing

$$H_0 : G_n \sim G(n,p) \text{ for some } 0 < p < 1$$

against

$$H_1 : G_n \sim G(n,\kappa) \text{ with } \kappa \not\equiv p \text{ for all } 0 < p < 1,$$

then one can consider the test

$$\varphi(G_n) = I[C_{1-\alpha}(G_n) = \emptyset] = I\Bigl[\inf_{0<p<1}\bigl(W_1^2(p, G_n) + W_2^2(p, G_n)\bigr) > q_{1-\alpha}\Bigr].$$

Note, however, that this test will have significance level less than $\alpha$ and
can therefore only be considered as a conservative test for homogeneity of a
dense graph.

Before we prove Theorem 3.1 we need some technical lemmas. For each
$(i, j, k, l)$ define

$$V_{ijkl} = I_{ij}I_{jk}I_{kl}I_{li} - p^3(I_{ij} + I_{jk} + I_{kl} + I_{li}) + 3p^4,$$

where $I_{ij} = I_{ji}$ is the indicator of the event that there is an edge connecting
$i$ and $j$. It is straightforward to verify that

$$\mathbb{E}(V_{ijkl} \mid I_{uv}) = 0 \tag{3.2}$$

for any $(i, j, k, l)$ and any $(u, v)$. From (3.2), we have the following lemma.

Lemma 3.3. For any $(i, j, k, l)$ and $(u, v, w, m)$ we have

$$\mathbb{E}(V_{ijkl}I_{uv}) = 0$$

and, if $|\{i, j, k, l\} \cap \{u, v, w, m\}| \leq 2$,

$$\mathbb{E}(V_{ijkl}V_{uvwm}) = 0. \tag{3.3}$$
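The degeneracy behind this lemma can be checked by exact enumeration. The sketch below takes the corrected 4-cycle statistic to be the product of the four edge indicators of a cycle, minus $p^3$ times their sum, plus $3p^4$, and verifies that its conditional expectation given any single edge of the cycle vanishes. The function names are hypothetical, and the edge indicators are assumed to be independent Bernoulli($p$) variables as in $G(n,p)$.

```python
from fractions import Fraction
from itertools import product

def cycle_statistic(edges, p):
    """Corrected 4-cycle statistic; `edges` holds the four 0/1 indicators."""
    prod_all = edges[0] * edges[1] * edges[2] * edges[3]
    return prod_all - p ** 3 * sum(edges) + 3 * p ** 4

def conditional_mean(p, fixed_edge, fixed_value):
    """E[V | I_e = fixed_value] for an edge e of the cycle, by enumerating
    the remaining three independent Bernoulli(p) indicators."""
    total = Fraction(0)
    for edges in product((0, 1), repeat=4):
        if edges[fixed_edge] != fixed_value:
            continue
        weight = Fraction(1)
        for idx, e in enumerate(edges):
            if idx != fixed_edge:
                weight *= p if e else 1 - p
        total += weight * cycle_statistic(edges, p)
    return total

p = Fraction(1, 3)
all_zero = all(conditional_mean(p, e, x) == 0
               for e in range(4) for x in (0, 1))
```

For an edge outside the cycle the claim reduces to the unconditional mean being zero, by independence.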

In what follows we will always assume that $k$-tuples $(i_1, \dots, i_k) \in \{1, \dots, n\}^k$
are ordered as $i_1 < i_2 < \cdots < i_k$. With $\iota = (i, j, k, l)$, define

$$X_{1,\iota} = I_{ij} + I_{ik} + I_{il} + I_{jk} + I_{jl} + I_{kl} - 6p,$$

$$X_{2,\iota} = V_{ijkl} + V_{ijlk} + V_{ikjl},$$

and $X_\iota = (X_{1,\iota}, X_{2,\iota})^t$. Now we can represent $W$ as a sum of locally
dependent random vectors, namely
where the sum ranges over all $\iota = (i, j, k, l)$, again with $i < j < k < l$.
To see that (3.4) is the same as in Theorem 3.1, note that each ordered
tuple $(i, j, k, l)$ corresponds uniquely to three possible 4-cycles. Furthermore,
in the first sum in (3.4), each edge $I_{ij}$ is over-counted $\binom{n-2}{2}$ times. It is
straightforward to check that

$$\mathbb{E}W = 0, \qquad \mathbb{E}(WW^t) = I_2.$$

Since $X_\iota$ is independent of $X_\kappa$ if $\iota$ and $\kappa$ share at most one vertex, we can
choose, for each $\iota$, the neighbourhood $A_\iota := \{\kappa : |\kappa \cap \iota| \geq 2\}$ and we have
that, for given $\iota$, the collection $(X_\kappa)_{\kappa \notin A_\iota}$ is independent of $X_\iota$. Therefore, if
$I$ is uniformly distributed over all $\iota = (i, j, k, l)$ with $i < j < k < l$,

(W,W',G) := (w,W- ^'-(4)^) ( 3 ' 5 ) 

is a Stein coupling (c.f. Section 5).

Proof of Theorem 3.1. We apply Theorem 2.1 for the Stein coupling (3.5).
Let as usual $D = W' - W$. In what follows, $C$ denotes a positive constant
independent of $p$ and $n$, possibly different from line to line. Note first that

$$\sigma_1^2 \geq Cn^2p(1-p), \qquad \sigma_2^2 \geq Cn^5p^6(1-p)^2.$$

Hence, 



n 2 cri (T2/ n 5 / 2 p 3 (l—p) 
and |-AJ ^ Cn 2 , and we obtain the upper bounds 

Cn 3 / 2 C 

The second moment of \D\ can be calculated as follows. Noting that |/erW| ^ 
2 implies lE(X2 tK X2 jK ') = 0, which follows from (|3.3p . we have 

E|L>| 2 = E1D 2 + EZ)| 

< Cn 2 x n 2 x + Cn 2 x n x \ < 



n 4 <7 2 cr 2 ^ n 2 p 6 (l — p) 2 ' 



Define the $\sigma$-field $\mathcal{F} = \sigma(G_n)$. Clearly, $\mathcal{F} \supset \sigma(W)$. In the following, we
calculate the variances of the conditional expectations in the bound (2.3).
The key ingredient is Lemma 3.3.
First,



Cra 10 

= E E Cov(X 1)t X 1)K ,X lit ,X lA ,) ^ — T , 
z — ' n 8 o"i 

where the last inequality is because Cov(Xi jt Xi jft , Xi^X\^ K >) = if (l,k) 
and (</, k') share at most one vertex. By the same argument, 

Cn 10 

Var(E^GiL>2) < ]T 2 Cov (^i ^i/^2 A ^ 

and 

rv> 10 



furthermore, from (|3,3p . 



r?? 9 

Var(E^G 2J D 2 ) «S — j-. 



Therefore, 



n 1 / 2 p 6 (l — p) 2 
By similar arguments, 

VarE^D 2 ) < Var E^D 2 ) < 

and, hence, 

n 5 / 2 p 6 (l — p) 2 
Finally, using Lemma 13.31 we obtain 



n iZ a° n^afa^ 

Cn 13 Cn 14: 
VarE^G^ 2 )^^^, Var E^(G 2 D 2 ) < -g-^, 

(7 n i3 Cn 13 
VarE^CGa^iDzX-r-TT. Var E^G,!) 2 ) < — g-, 

and therefore 

C 

B 3 



np 9 (l — p) 3 

Collecting the bounds on $B_1$, $B_2$ and $B_3$, in combination with (3.6) and (3.7),
yields the final estimate via (2.3). □

3.2 Joint normality of certain permutation statistics

Let $M$ be a real $n \times n$ matrix and assume that $M$ is anti-symmetric, that
is, for each $i \neq j$ we have

$$M_{ij} = -M_{ji}.$$

Let $\pi$ be a permutation of size $n$, chosen uniformly at random, and consider
the statistic

$$W = \sum_{i<j} M_{\pi(i)\pi(j)}. \tag{3.8}$$

This type of permutation statistic was considered by Fulman (2004) and it
is a special case of doubly-indexed permutation statistics

$$\sum_{i,j} a(i, j, \pi(i), \pi(j)) \tag{3.9}$$

with

$$a(i, j, k, l) = I[i < j]\,M_{kl}.$$

The reason to study (3.8) is that two important properties of permutations,
the number of descents and inversions, can be readily represented in this
form. Choosing $M_{i,i+1} = -1$ and $M_{ij} = 0$ for all other $j > i$ (for $j < i$, $M_{ij}$
is defined via anti-symmetry), (3.8) becomes $2\mathrm{Des}(\pi^{-1}) - (n-1)$, where
$\mathrm{Des}(\pi)$ is the number of descents of $\pi$, and with $M_{ij} = -1$ for all $i < j$,
(3.8) becomes $2\mathrm{Inv}(\pi^{-1}) - \binom{n}{2}$, where $\mathrm{Inv}(\pi)$ is the number of inversions
of $\pi$.
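The two representations can be confirmed by brute force for small $n$. The following sketch uses 0-based indices and hypothetical helper names; it evaluates the statistic $\sum_{i<j} M_{\pi(i)\pi(j)}$ for the two choices of $M$ described above and compares it with the descent and inversion counts of $\pi^{-1}$.

```python
from itertools import permutations

def statistic(M, perm):
    """W = sum over positions i < j of M[perm[i]][perm[j]] (0-based)."""
    n = len(perm)
    return sum(M[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n))

def inverse(perm):
    inv = [0] * len(perm)
    for pos, val in enumerate(perm):
        inv[val] = pos
    return tuple(inv)

def descents(perm):
    return sum(perm[i] > perm[i + 1] for i in range(len(perm) - 1))

def inversions(perm):
    n = len(perm)
    return sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))

def descent_matrix(n):
    """M[i][i+1] = -1, M[i+1][i] = +1, all other entries 0."""
    M = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        M[i][i + 1], M[i + 1][i] = -1, 1
    return M

def inversion_matrix(n):
    """M[i][j] = -1 for i < j and +1 for i > j."""
    return [[(-1 if i < j else 1 if i > j else 0) for j in range(n)]
            for i in range(n)]

n = 4
identities_hold = all(
    statistic(descent_matrix(n), p) == 2 * descents(inverse(p)) - (n - 1)
    and statistic(inversion_matrix(n), p) == 2 * inversions(inverse(p)) - n * (n - 1) // 2
    for p in permutations(range(n))
)
```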

Using Stein's method, Zhao et al. (1997) prove a general Berry-Esseen
type theorem for sums of the form (3.9), but their results do not apply to
the number of descents $\mathrm{Des}(\pi)$, which seems to be "too sparse". In contrast,
using a special exchangeable pair, Fulman (2004) was able to obtain a rate
of convergence of $n^{-1/2}$ for the Kolmogorov metric for both the number of
descents and inversions.

We shall extend Fulman's results to the multivariate setting. Furthermore,
we are able to remove a certain condition on $M$ (present in Fulman's
work), arising from the requirement of exchangeability; c.f. Remark 2.3.

Let $M^1, \dots, M^d$ be a sequence of real $n \times n$ matrices and assume that
each matrix is anti-symmetric. For each $r$, define $W_r$ as in (3.8) with respect
to $M^r$. As in Fulman (2004), define

$$A_i^r = \sum_{j>i} M^r_{ij}, \qquad B_i^r = \sum_{j<i} M^r_{ji}.$$

The mean and covariances of $W = (W_1, \dots, W_d)$ are given in the following
lemma.



Lemma 3.4. We have $\mathbb{E}W = 0$ and

$$\operatorname{Cov}(W_r, W_s) = \frac{1}{3}\Bigl(\sum_{i<j} M^r_{ij}M^s_{ij} + \sum_i (A_i^r - B_i^r)(A_i^s - B_i^s)\Bigr).$$
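The covariance formula can be sanity-checked numerically. The sketch below assumes the reading $A_i^r = \sum_{j>i} M^r_{ij}$ and $B_i^r = \sum_{j<i} M^r_{ji}$ of the row sums and the formula $\operatorname{Cov}(W_r, W_s) = \tfrac13\bigl(\sum_{i<j} M^r_{ij}M^s_{ij} + \sum_i (A_i^r - B_i^r)(A_i^s - B_i^s)\bigr)$, and compares it with the exact covariance obtained by enumerating all permutations. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
import random

def perm_statistic(M, perm):
    n = len(perm)
    return sum(M[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n))

def exact_covariance(Mr, Ms):
    """Cov(W_r, W_s) under a uniform permutation, by full enumeration."""
    perms = list(permutations(range(len(Mr))))
    wr = [perm_statistic(Mr, p) for p in perms]
    ws = [perm_statistic(Ms, p) for p in perms]
    N = len(perms)
    mean_r, mean_s = Fraction(sum(wr), N), Fraction(sum(ws), N)
    return Fraction(sum(a * b for a, b in zip(wr, ws)), N) - mean_r * mean_s

def lemma_covariance(Mr, Ms):
    """Right-hand side of the covariance formula (assumed reading of A and B)."""
    n = len(Mr)
    def A(M, i):
        return sum(M[i][j] for j in range(i + 1, n))
    def B(M, i):
        return sum(M[j][i] for j in range(i))
    pair_sum = sum(Mr[i][j] * Ms[i][j] for i in range(n) for j in range(i + 1, n))
    row_sum = sum((A(Mr, i) - B(Mr, i)) * (A(Ms, i) - B(Ms, i)) for i in range(n))
    return Fraction(pair_sum + row_sum, 3)

def random_antisymmetric(n, rng):
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = rng.randint(-3, 3)
            M[j][i] = -M[i][j]
    return M

rng = random.Random(0)
M1, M2 = random_antisymmetric(5, rng), random_antisymmetric(5, rng)
formula_matches = exact_covariance(M1, M2) == lemma_covariance(M1, M2)
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a floating-point approximation.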
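Proof. The proof is similar to that of Fulman (2004, Lemma 4.3.1). For
each $r$ define

$$X^r_{ij}(\pi) = \begin{cases} M^r_{ij} & \text{if } \pi^{-1}(i) < \pi^{-1}(j), \\ M^r_{ji} & \text{if } \pi^{-1}(j) < \pi^{-1}(i). \end{cases}$$

Then

$$W_r = \sum_{i,j:\, \pi^{-1}(i) < \pi^{-1}(j)} M^r_{ij} = \sum_{i<j} X^r_{ij}(\pi).$$

Since $\mathbb{E}X^r_{ij}(\pi) = 0$, we have $\mathbb{E}W_r = 0$. To avoid over-use of brackets in
what follows, expressions such as "$\mathbb{E}X^r_{ij}(\pi)X^s_{kl}(\pi)$" are to be understood
as "$\mathbb{E}\{X^r_{ij}(\pi)X^s_{kl}(\pi)\}$". Recalling that $M^r$ is anti-symmetric, $\operatorname{Cov}(W_r, W_s)$
can be calculated as

$$\begin{aligned}
\operatorname{Cov}(W_r, W_s)
&= \sum_{i<j} \mathbb{E}X^r_{ij}(\pi)X^s_{ij}(\pi)
 + \sum_i \sum_{j \neq l:\, j,l>i} \mathbb{E}X^r_{ij}(\pi)X^s_{il}(\pi)
 + \sum_j \sum_{i \neq k:\, i,k<j} \mathbb{E}X^r_{ij}(\pi)X^s_{kj}(\pi) \\
&\quad + \sum_{i<j<l} \mathbb{E}X^r_{ij}(\pi)X^s_{jl}(\pi)
 + \sum_{k<i<j} \mathbb{E}X^r_{ij}(\pi)X^s_{ki}(\pi) \\
&= \sum_{i<j} M^r_{ij}M^s_{ij}
 + \sum_i \sum_{j \neq l:\, j,l>i} \frac{M^r_{ij}M^s_{il}}{3}
 + \sum_j \sum_{i \neq k:\, i,k<j} \frac{M^r_{ij}M^s_{kj}}{3}
 - \sum_{i<j<l} \frac{M^r_{ij}M^s_{jl}}{3}
 - \sum_{k<i<j} \frac{M^r_{ij}M^s_{ki}}{3} \\
&= \sum_{i<j} M^r_{ij}M^s_{ij}
 + \sum_i \frac{1}{3}\Bigl(A_i^rA_i^s - \sum_{j>i} M^r_{ij}M^s_{ij}\Bigr)
 + \sum_i \frac{1}{3}\Bigl(B_i^rB_i^s - \sum_{j<i} M^r_{ji}M^s_{ji}\Bigr)
 - \frac{1}{3}\sum_i \bigl(B_i^rA_i^s + A_i^rB_i^s\bigr) \\
&= \frac{1}{3}\sum_{i<j} M^r_{ij}M^s_{ij} + \frac{1}{3}\sum_i (A_i^r - B_i^r)(A_i^s - B_i^s). \qquad \Box
\end{aligned}$$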

Without loss of generality, in the following we assume that $\operatorname{Var}(W_r) = 1$
for each $r$. With $W = (W_1, \dots, W_d)^t$, we have the following result.

Theorem 3.5. Let $W$ be as above and let

$$\beta = \sup_{r,i} \sum_{j=1}^n |M^r_{ij}|. \tag{3.10}$$

Then, with $\Sigma = \operatorname{Cov}(W)$, there is a positive constant $C_d$ depending only
on $d$, such that

d c (^(W),^(^ 2 Z)) ^ CadlS- 1 / 2 !!! + ||S- 1 / 2 ||^)n^ 3 . 
As a corollary of Theorem 3.5 we prove the joint asymptotic normality
of the number of descents and inversions of $\pi$; the rate obtained is best
possible.

Corollary 3.6. Let $\mathrm{Des}(\pi)$ and $\mathrm{Inv}(\pi)$ be the number of descents and
inversions of $\pi$, and let

$$W = (W_1, W_2)^t = \left(\frac{\mathrm{Des}(\pi) - \frac{n-1}{2}}{\sqrt{\frac{n+1}{12}}},\ \frac{\mathrm{Inv}(\pi) - \frac{1}{2}\binom{n}{2}}{\sqrt{\frac{n(n-1)(2n+5)}{72}}}\right)^t.$$

Then

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) \leq \frac{C}{\sqrt{n}}$$

for some absolute constant $C$, where $Z$ is a 2-dimensional standard Gaussian
vector.



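Proof. Set

$$M^1_{ij} = \sqrt{\frac{3}{n+1}} \times \begin{cases} -1 & \text{if } j = i+1, \\ +1 & \text{if } j = i-1, \\ 0 & \text{otherwise,} \end{cases}$$

and set

$$M^2_{ij} = 3\sqrt{\frac{2}{n(n-1)(2n+5)}} \times \begin{cases} -1 & \text{if } j > i, \\ +1 & \text{if } j < i, \\ 0 & \text{otherwise.} \end{cases}$$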

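It can be easily verified that $\sum_{i<j} M^1_{\pi(i)\pi(j)}$ and $\sum_{i<j} M^2_{\pi(i)\pi(j)}$ are $W_1$ and $W_2$
evaluated at $\pi^{-1}$, which is again uniformly distributed.
From Lemma 3.4, $\operatorname{Var}(W_1) = \operatorname{Var}(W_2) = 1$ and $|\operatorname{Cov}(W_1, W_2)| \leq C/n$.
Moreover, $\beta$ as defined in (3.10) is smaller than $C/\sqrt{n}$. Therefore, the
corollary is proved by applying Theorem 3.5. □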

To prove Theorem 3.5 we need the following lemma, the proof of which
is straightforward and therefore omitted.

Lemma 3.7. For $1 \leq r, s, t \leq d$ and $\beta$ defined in (3.10), we have

]T \Mr ii2 M? l!i3 Mr ii5 Mt ii6 \ < n 2 /3\ (3.11) 

ii,...,«6 

E \M[ li2 M? li3 M[ 4i5 Mt 4i6 \ < n(3\ (3.12) 

{ii,...,i 6 }|<5 

E \ M UM^ M \^M^ M liM^\ < ™ 2 /^ ( 3 - 13 ) 

ii,...,«8 

E \ M U^iM,iMsiMw M U*\ < ^ 6 > ( 3 - 14 ) 

|{u,...,i 8 }|<7 

where $\sum_{|\{i_1, \dots, i_k\}| \leq k-1}$ stands for summation over all tuples $(i_1, \dots, i_k)$ for
which at least two components are equal.
Proof of Theorem 3.5. We adopt the construction of $W'$ from Fulman (2004).
Let $I$ be uniformly chosen from $\{1, \dots, n\}$ and independently of $\pi$. Given
$I$, we define $\pi'$ as $\pi \circ (I, I+1, \dots, n)$, where $(I, I+1, \dots, n)$ denotes the
mapping $I \mapsto I+1 \mapsto \cdots \mapsto n \mapsto I$, while keeping the rest identical. As $\pi$
and $\pi'$ both are uniformly distributed, $W$ and $W'$ have the same marginal
distribution (but are not necessarily exchangeable). Fulman (2004) showed
that with $\lambda = 2/n$

$$\mathbb{E}^W(W' - W) = -\lambda W.$$

Following Remark 2.3, the bound (2.4) holds with $D = W' - W$ and
$G = \tfrac{1}{2}\lambda^{-1}D$ (c.f. Section 5). From the construction of $W'$ and the definition of
$\beta$ in (3.10), we have

$$|G| \leq C_d n\beta, \qquad |D| \leq C_d\beta. \tag{3.15}$$

We first prove that

$$\operatorname{Var}\mathbb{E}^{\pi}(D_rD_s) \leq \frac{C_d\beta^4}{n}. \tag{3.16}$$


From the construction of W, 
Var1E T (D r D s ) 



(4 n \ 
"EE K(> )M n) M k>UUi)) 
i=l ji,32>i 



16 



MEE M WoO M W(i)+E E Kn,«W M kQ,*(* 

i=l j>i 



i=l 31,J2> i 



As the first double sum in the last line is constant, we only need to show 
that, for each i, 

E Var(M; Wi7r0i) M^ )i7r02) )^C d n/3 4 
«<ii<j2 



and 



(3.17) 



!<Jl<i2. fc < ! l< i 2 

(*>ii>i2)9 4 ( fc > I i> l a) 



From (13TT21) . 



i<h<h 



i<ji<j2 



i<h<h \{k,l,m}\=3 

< C>/3 4 . 
To prove (3.17), we consider the following three cases where $|\{i, j_1, j_2, k, l_1, l_2\}|$
is either 4, 5 or 6. We will, for the remainder of this proof, use the simplified
notation
E - E 



i<il<i2. fe < i l< ! 2 
|{i,Jl,J2.'=. ! l. i 2}l= 4 



and analogously for the cases 5 and 6. Let $\tilde\pi$ be an independent copy of $\pi$.

^ COV ( M ^i),7T(j 1 ) M w(i),7T(j 2 )^ M n(k),7T(h) M w(k),7T(h)) 

4 

^ ^l EM ^«^(ii) M ^(i),7r0 2 ) M ^(fc),7r(Zi) M ^(fc),7r(« 2 ) I 
4 

+ | ^ M l(i) ,7T (.71 ) M ^(i) ,7T(j 2 ) M *(fc) ,7f (It ) M l(fc) ,# (fa ) I 



where in the second inequality, we used (3.11) and (3.12). Similarly,



Yl Coy ( M ki)Mh) M ki)rth)> M kk)Mh) M kk),Ah)) 



Lastly, 



Yl Cov ( M ki)Mh) M k^Mh)> M kk)Mh) M kk)Mh)) 
Y^ M ki)Mh) M ki)Mh) M kk)Mh) M kk),Ah) 

6 

x (l-P[|{7r(i),7r(ii),7r(i2),7f(fc),7f(Zi),7f(Z 2 )}|=6]) 



+ 



(h) 



x l[|{7r(i),7r(i 1 ),7r(i 2 ),7r(fc),7f(/i),7f(Z 2 )}| < 5] 



^ C d nP 4 



where we used (3.11) and (3.12) in the last step. Therefore, we have proved
(3.17), and thus (3.16). By the same argument as in proving (3.16), and
using the bounds (3.13) and (3.14), we can prove



VarE^ZEAA) ^ 



C d f3 6 



(3.18) 



Applying the bounds (3.15), (3.16) and (3.18) in (2.4) and observing that
$\beta \leq C_d n\beta^3$ proves the theorem. □
4 PROOF OF MAIN THEOREM
For given test function $h$, we consider the Stein equation

$$\Delta f(w) - w^t\nabla f(w) = h(w) - \mathbb{E}h(Z), \qquad w \in \mathbb{R}^d, \tag{4.1}$$

where $\Delta$ denotes the Laplacian operator and $\nabla$ the gradient operator. If $h$
is not continuous (like the indicator function of a convex set), then $f$ is not
smooth enough to apply Taylor expansion to the necessary degree, so more
refined techniques are necessary.

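We follow the smoothing technique of Bentkus (2003). Recall that $\mathcal{A}$ is
the collection of all convex sets in $\mathbb{R}^d$. For $A \in \mathcal{A}$, let $h_A(x) = I_A(x)$, and
define the smoothed function

$$h_{A,\varepsilon}(w) = \psi\Bigl(\frac{d(w, A)}{\varepsilon}\Bigr), \tag{4.2}$$

where $d(w, A) = \inf_{v \in A}|w - v|$ and

$$\psi(x) = \begin{cases} 1, & x < 0, \\ 1 - 2x^2, & 0 \leq x < \tfrac{1}{2}, \\ 2(1-x)^2, & \tfrac{1}{2} \leq x < 1, \\ 0, & 1 \leq x. \end{cases} \tag{4.3}$$

Define also

$$A^{\varepsilon} = \{x \in \mathbb{R}^d : d(x, A) \leq \varepsilon\}, \qquad A^{-\varepsilon} = \{x \in A : d(x, \mathbb{R}^d \setminus A) > \varepsilon\}$$

(note that in general $(A^{-\varepsilon})^{\varepsilon} \neq A$).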

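Lemma 4.1 (Bentkus (2003)). The function $h_{A,\varepsilon}$ as defined above has the
following properties:

(i) $h_{A,\varepsilon}(w) = 1$ for all $w \in A$, (4.4)
(ii) $h_{A,\varepsilon}(w) = 0$ for all $w \in \mathbb{R}^d \setminus A^{\varepsilon}$, (4.5)
(iii) $0 \leq h_{A,\varepsilon}(w) \leq 1$ for all $w \in A^{\varepsilon} \setminus A$, (4.6)
(iv) $|\nabla h_{A,\varepsilon}(w)| \leq 2\varepsilon^{-1}$ for all $w \in \mathbb{R}^d$, (4.7)
(v) $|\nabla h_{A,\varepsilon}(v) - \nabla h_{A,\varepsilon}(w)| \leq 8|v - w|\varepsilon^{-2}$ for all $v, w \in \mathbb{R}^d$. (4.8)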
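The smoothing construction can be tried out on a concrete convex set. The sketch below takes $A$ to be the closed unit ball (an assumption made only for this example, since the distance to a ball has a closed form) and implements the piecewise quadratic $\psi$ with branches $1$, $1 - 2x^2$, $2(1-x)^2$ and $0$. The function names are hypothetical.

```python
import math

def psi(x):
    """Piecewise quadratic cut-off: 1 for x < 0, 0 for x >= 1, C^1 in between."""
    if x < 0:
        return 1.0
    if x < 0.5:
        return 1.0 - 2.0 * x * x
    if x < 1.0:
        return 2.0 * (1.0 - x) ** 2
    return 0.0

def distance_to_unit_ball(w):
    """Euclidean distance from w to the closed unit ball."""
    return max(math.sqrt(sum(c * c for c in w)) - 1.0, 0.0)

def smoothed_indicator(w, eps):
    """Smoothed indicator of the unit ball: psi(d(w, A) / eps)."""
    return psi(distance_to_unit_ball(w) / eps)

inside = smoothed_indicator((0.5, 0.0), 0.1)    # equals 1 on the ball
far_away = smoothed_indicator((1.2, 0.0), 0.1)  # equals 0 beyond distance eps
```

Since $|\psi'| \leq 2$ and the distance function is 1-Lipschitz, the slope of the smoothed indicator along any ray is at most $2/\varepsilon$, in line with property (iv).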



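Lemma 4.2. For any $d$-dimensional random vector $W$,

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) \leq 4d^{1/4}\varepsilon + \sup_{A \in \mathcal{A}} |\mathbb{E}h_{A,\varepsilon}(W) - \mathbb{E}h_{A,\varepsilon}(Z)|. \tag{4.9}$$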
Proof. A standard argument yields that, for any $\varepsilon > 0$,

$$d_c(\mathcal{L}(W), \mathcal{L}(Z)) \leq \sup_{A \in \mathcal{A}} |\mathbb{E}h_{A,\varepsilon}(W) - \mathbb{E}h_{A,\varepsilon}(Z)| + \sup_{A \in \mathcal{A}} \max\{P(Z \in A^{\varepsilon} \setminus A), P(Z \in A \setminus A^{-\varepsilon})\}.$$

From Ball (1993) and Bentkus (2003) we have

$$\sup_{A \in \mathcal{A}} \max\{P(Z \in A^{\varepsilon} \setminus A), P(Z \in A \setminus A^{-\varepsilon})\} \leq 4d^{1/4}\varepsilon \tag{4.10}$$

(the dependence on $d$ in (4.9) is optimal; see Bentkus (2003)). □
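Fix now $\varepsilon$ and a convex $A \subset \mathbb{R}^d$. It can be verified directly that

$$f_{A,\varepsilon}(w) = -\frac{1}{2}\int_0^1 \frac{1}{1-s}\int_{\mathbb{R}^d} \bigl[h_{A,\varepsilon}(\sqrt{1-s}\,w + \sqrt{s}\,z) - \mathbb{E}h_{A,\varepsilon}(Z)\bigr]\varphi(z)\,dz\,ds$$

is a solution to (4.1), where $\varphi$ is the density function of the $d$-dimensional
standard normal distribution. In what follows, we keep the dependence on
$A$ and $\varepsilon$ implicit and write $f = f_{A,\varepsilon}$. For real-valued functions on $\mathbb{R}^d$ we
will write $f_i(x)$ for $\partial f(x)/\partial x_i$, $f_{ij}(x)$ for $\partial^2 f(x)/(\partial x_i \partial x_j)$ and so forth.
Using this notation, we have for $1 \leq i, j \leq d$ that

$$\begin{aligned}
f_{ij}(w) &= -\frac{1}{2}\int_{\varepsilon^2}^1 \frac{1}{s}\int_{\mathbb{R}^d} h(\sqrt{1-s}\,w + \sqrt{s}\,z)\varphi_{ij}(z)\,dz\,ds \\
&\quad + \frac{1}{2}\int_0^{\varepsilon^2} \frac{1}{\sqrt{s}}\int_{\mathbb{R}^d} h_j(\sqrt{1-s}\,w + \sqrt{s}\,z)\varphi_i(z)\,dz\,ds.
\end{aligned} \tag{4.11}$$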

Proof of Theorem 2.1. Fix $A \in \mathcal{A}$ and $\varepsilon > 0$ (to be chosen later) and let
$f = f_{A,\varepsilon}$ be the solution to the Stein equation (4.1) with respect to $h = h_{A,\varepsilon}$
as defined by (4.2). Let

$$\kappa := d_c(\mathcal{L}(W), \mathcal{L}(Z)).$$

Adding and subtracting the corresponding terms, we have

$$\begin{aligned}
\mathbb{E}\{\Delta f(W) - W^t\nabla f(W)\}
&= \mathbb{E}\{G^t\nabla f(W') - G^t\nabla f(W) - W^t\nabla f(W)\} \\
&\quad + \sum_{i,j=1}^d \mathbb{E}\{(\delta_{ij} - G_iD_j)f_{ij}(W)\} \\
&\quad - \mathbb{E}\Bigl\{\sum_{i=1}^d G_if_i(W') - \sum_{i=1}^d G_if_i(W) - \sum_{i,j=1}^d G_iD_jf_{ij}(W)\Bigr\} \\
&=: R_0 + R_1 - R_2.
\end{aligned}$$

As $(W, W', G)$ is a Stein coupling, clearly $R_0 = 0$. Using (4.11), and the fact
that $\int_{\mathbb{R}^d} \varphi_{ij}(z)\,dz = 0$, we have



i 1.7 — 1 



x 



[h(Vl - sW + yfsz) - h(Vl - sW)]<pij(z)dzds 
+ £ E f iTr I [ E ^(<% - G i D j ))h j (VT^W + y/az)tpi(z)dzd8. 

From the definition of $\kappa$ and the concentration inequality of the standard
$d$-dimensional Gaussian distribution (c.f. (4.10)), we have



M{h(VT^sW + ^~sz) - h(VT^sW) } 
^1E{l[d(VT^W,A £ \A) < Vs\z\]} 

< Ad 1 /' ( 



+ 2 



1 - s' 



z\ + 2k. 



Using the Cauchy-Schwarz inequality, the bound (4.7), the simple inequality
$\sqrt{a_1 + a_2 + a_3} \leq \sqrt{a_1} + \sqrt{a_2} + \sqrt{a_3}$ for $a_1, a_2, a_3 > 0$, and
$\int_{\mathbb{R}^d} |z|^{1/2}|\varphi_{ij}(z)|\,dz \leq Cd^{1/4}$, we have



|2?i| < ^2 yVarE w (G i L> i 



1/2 



'^1 + — ) \fij(z)\dzds + 2 



s Jwi d V y/l — s \j I — s 

< ^(d 1 /^ 1 / 2 ! loge| + d 3 / 8 + ^/2| log£ |). 

In order to estimate $R_2$, let $U$ and $V$ be independent random variables
distributed uniformly on $[0, 1]$. Using the integration by parts formula,

* = £ B I (-5) L + UB > + ^) 

— h{Vl — sW + y/sz)]GiDjipij(z)dzds 

+E E f [hj{VT^~s(W + UD) + V^z) 
~~i Jo l \ s Jn d 

— hj(y/l — sW + \fsz)]GiDj(pi(z)dzds 



E / h(VT^~sW + ^z + VT^~sUVD) 
x UGiDjDk(fijk{z)dzds

d r-e 2 R [■ 

+ H E / / d h jk (VT^W + J7sz + y/T^lUVD) 

x UGiDjDf c (fi(z)dzds 

= : i?2,l + #2,2- 

We can rewrite i?2,i as 

- - sW + y/sz)]UGiDjD k ip ijk {z)dzds 
x [/-[E^CGi-DjAk) - E(G iJ D iJ D fc )]^ fc (^)^ds 

«,J,K = 1 

- /i(Vl - + Vsz)]C/E(G i L> i L>fc)99i i fc(^)^s 
x U'E(GiDjDj ; .)<fijk(z)dzds 

= '■ #2,1,1 + #2,1,2 + #2,1,3 + #2,1,4- 

Now, it is straightforward to verify that for any it, v,w, z G R d 
d 

UiVjWkipijkiz) 

i,j,k=l <• ' ' 

= — U*Z 1> 2 U>* Z </j(z) + (« V W 2 + U*?!) V 1 Z + U «J U t z)(f(z) . 

From ()4.12|) and the boundedness condition (|2.2p . 
Ii22.ul < Ej£ J ud l[d{VT^W + ^~sz,A e \ A) < VT^P] 

d 

^ GiDjD k ip ijk (z) dzds 

; a 7„ — i 



s: a 



£ ^37? ^ d P [<*(VT=7W + Viz, A £ \A)^ VT^P] 
x E|D| 2 (3|z| + |z| 3 )v9(z)(iz(is 
x (E W \D\ 2 - E|D| 2 )}(3|z| + \zf)<p{z)dzds



< Cd^aEl^pe- 1 (k + d 1/4 (/3 + e)) + Cd^V^Var M W \D\ 2 . 
From the fact that sup 4 J Rti ^ C, 



i,j,k=l 

From (grigD , 

|i?2,i,3|<C(/^- 1 +d 1 / 4 )E(|G|| J D| 2 ). 
For i?2,i,4) applying the integration by parts formula, 



i? 2 ,i,4= V E / ^ 5 / /»i ifc (vT^Z + ^) 

^ Je 2 2 ^ 

x UTEiGiDjD^tp^dzds 

y, r 1 -V^~s r h ^ um ( G . D Dk) ^ z ) dzds 

Til, Je 2 2 JW 1 



i,j,k=l ' 



= J2 I I Hz)U^{GiDiD k )y ijk {z)dzds 

iM^xJ* 2 J^ d 

where we pretended that the third partial derivatives of h exist. This is not 
a problem because we can first smooth h to have third derivatives then take 
the limit. Now with (jUZD , 

|i? 2)M | <CE(|G||£>| 2 ). 
From g3]) and | £? =1 G^i(X)| < "M^tXh 



x 1^1^ 



i=l 



dzds 



x E|£>| 2 |,z|y?(,z)ofeds 
E JO 43 I Ji&d

x (E^IZ?! 2 - E\D\ 2 )}\z\<p(z)dzds 
< Cd 1 / 2 ^ ^E|D| 2 (k^ 1 + d 1 / 4 /^ 1 + d 1 / 4 ) + e" 1 ^Var E^D) 2 ^ . 

Therefore, 

|i? 2 | < c(d^ 2 dE\D\ 2 e- 1 {n + d 1 ^^ + e)) + d 3/2 e~ 1 aB 1 + e" 1 ^). 



Collecting the bounds and using the smoothing inequality (4.9), we obtain
the following recursive inequality

k < C{d 3/2 aE|D| V 1 (k + d 1/4 (/3 + e)) + d 3/2 e~ 1 aB 1 

(4 13) 

+ + 5 2 ((i 3/8 + (i 1/8 e 1/2 | loge| + loge|) } + M^e. 1 ' 



Let 



$$\varepsilon = 2Cd^{3/2}\alpha\,\mathbb{E}|D|^2 + \beta + d^{5/8}\alpha^{1/2}B_1^{1/2} + d^{1/8}B_2 + d^{-1/8}B_3^{1/2}$$



with the same constant C as in (|4,13p . The theorem is proved by solving 
the recursive inequality for k and observing that as long as e is smaller than 
an absolute constant, e 1 / 2 1 log e| ^ C and K 1 / 2 |loge| ^ Cd 1 ^. □ 

Sketch of the proof for Remark 2.3. Let $U$ and $V$ be uniform on $[0, 1]$,
independent of each other and all else. Under the conditions of Remark 2.3 we
have from Taylor expansion that

= \- 1 ®{f(W')-f(W)} 

d d 

= A- 1 E^(1U/ - Wi)fi{W) + A~ X E I ])j[ W + UVD) 

i=l *J=1 
d d 

= -^Y^WihiW) +E GiDjhjiW) 

i=l i,j=l 
d 

UdD^f^W) - fij(W + UVD)). 

»J=1 

Therefore, 

E{A/(TU) -W l Vf{W)} 

d 

= ^{(Sij-GiD^fijiW)} 
d 

+ 2E UG i D j (f ij (W) - fij(W + UVD)) =: R[ - R' 2 . 

»J=i 
The quantity $R_1'$ is the same as $R_1$ in the proof of Theorem 2.1. The
quantity $R_2'$ contains an additional integration step as compared to $R_2$ of
Theorem 2.1, but can be bounded in very much the same way (up to different
constants). □

5 SOME STEIN COUPLINGS 

In this section, we describe some known coupling constructions as multivari- 
ate Stein couplings for reference. 

5.1 Multivariate exchangeable pairs 

Chatterjee and Meckes (2008) and Reinert and Röllin (2009) introduced the
exchangeable pairs method for random vectors. These are particular Stein
couplings as we will now show. Assume that $(W, W')$ is an exchangeable
pair of $d$-dimensional random vectors such that

$$\mathbb{E}^W(W' - W) = -\Lambda W \tag{5.1}$$

for some invertible $(d \times d)$-matrix $\Lambda$. It is straightforward to check that

$$(W, W', G) = (W, W', \tfrac{1}{2}\Lambda^{-1}(W' - W))$$

is a Stein coupling.

Assume $\operatorname{Var}(W) = \Sigma$ is positive definite. Let $\Sigma^{1/2}$ be the unique
positive-definite root of $\Sigma$, and let $\Sigma^{-1/2}$ be its corresponding unique inverse. It was
shown by Reinert and Röllin (2009) that exchangeability of $(W, W')$ implies
symmetry of $\hat\Lambda = \Sigma^{-1/2}\Lambda\Sigma^{1/2}$. Let therefore $O$ be an orthonormal matrix
and let $L$ be a positive diagonal matrix such that $\hat\Lambda = O^tLO$. Define
$\hat W = O\Sigma^{-1/2}W$, $\hat W' = O\Sigma^{-1/2}W'$. It follows from (5.1) that

$$\mathbb{E}^{\hat W}(\hat W' - \hat W) = -L\hat W.$$

We could therefore, in principle, restrict ourselves to $(\hat W, \hat W')$ that are
uncorrelated with (5.1) being true for diagonal $\Lambda$. However, it is in practice
often much easier to work with the unstandardized $W$ as $\Sigma^{-1/2}$ and $O$ are
typically difficult to calculate.

5.2 Multivariate size bias couplings 

This coupling was considered by Goldstein and Rinott (1996). Let $Y$ be
a non-negative $d$-dimensional random vector with mean $\mu$ and covariance
matrix $\Sigma$. For each $i \in \{1, \dots, d\}$, let $Y^i$ be defined on the same probability
space as $Y$ and have the $Y$-size biased distribution in direction $i$, i.e.,

$$\mathbb{E}\{Y_if(Y)\} = \mu_i\,\mathbb{E}f(Y^i)$$

for all functions $f$ such that the expectations exist. Let $K$ be uniformly
distributed over $\{1, 2, \dots, d\}$ and let $e_K$ be the $d$-dimensional unit vector in
direction $K$. Then

$$(W, W', G) = (Y - \mu,\ Y^K - \mu,\ d\mu_Ke_K)$$

is a Stein coupling.



5.3 Local dependence 



A refined version of this dependence was considered by Rinott and Rotar
(1996). Let $(X_i)_{i \in \mathcal{J}}$ be a collection of centered $d$-dimensional random vectors
for some finite index set $\mathcal{J}$ of size $n$. For each $i \in \mathcal{J}$, assume there is a set
$A_i \subset \mathcal{J}$ such that $X_i$ is independent of $(X_j)_{j \notin A_i}$. Let $I$ be uniformly
distributed on $\mathcal{J}$. Then

$$(W, W', G) = \Bigl(\sum_{i \in \mathcal{J}} X_i,\ \sum_{i \in \mathcal{J} \setminus A_I} X_i,\ -nX_I\Bigr)$$

is a Stein coupling. Note that our Theorem 2.1 can yield informative bounds
if the random vectors are locally dependent, but uncorrelated, in cases where
the general theorem of Rinott and Rotar (1996) would not yield informative
bounds.